Abstract


Evaluating  the  reliability  of  maturity  level  ratings  is  crucial  for  providing  confidence  in  the 
results  of  software  process  assessments.  This  report  examines  the  dimensions  underlying  the 
maturity  construct  in  the  Capability  Maturity  Model®  (CMM®)  for  Software  (SW-CMM)  and 
then  estimates  the  internal  consistency  (reliability)  of  each  dimension.  The  analysis  is  based 
on  676  CMM-Based  Appraisal  for  Internal  Process  Improvement  (CBA IPI)  assessments 
conducted  during  the  period  of  January  2000  through  April  2002.  The  results  suggest  that  the 
SW-CMM maturity is a three-dimensional construct, with “Project Implementation” representing the maturity level 2 key process areas (KPAs), “Organization Implementation” representing the maturity level 3 KPAs, and “Quantitative Process Implementation” representing the KPAs at both maturity levels 4 and 5. The internal consistency for each of the three dimensions as estimated by Cronbach’s alpha exceeds the recommended value of 0.9. Although more should be learned about the distinctions between maturity levels 4 and 5, the internal consistency of those KPAs is comparable to that of the KPAs at levels 2 and 3.


®  Capability  Maturity  Model  and  CMM  are  registered  in  the  U.S.  Patent  and  Trademark  Office  by 
Carnegie  Mellon  University. 
1  Introduction


The  Capability  Maturity  Model®  (CMM®)  for  Software  (SW-CMM)  is  both  a  reference 
model  for  appraising  software  process  maturity  and  a  normative  model  for  helping  software 
organizations  progress  along  an  evolutionary  path  from  ad  hoc,  chaotic  processes  to  mature, 
disciplined software processes [Paulk et al. 93a-93c]. The CMM-Based Appraisal* for Internal Process Improvement (CBA IPI) is an assessment method designed to produce reliable and consistent assessments.

The  SW-CMM  is  one  of  the  best-known  and  most  widely  used  models  of  its  kind.  Thousands 
of  organizations  have  performed  SW-CMM  assessments  and  the  resources  expended  on  SW- 
CMM-based  software  process  improvement  (SPI)  are  estimated  to  be  in  the  billions  of  dollars 
[Herbsleb et al. 97]. Hereafter, this study uses the terms SW-CMM assessment and CBA IPI assessment interchangeably unless there is reason to distinguish them.


Appraisal  results  have  been  used  as  a  basis  for  many  important  decisions,  including  actions  to 
improve  internal  software  processes,  large-scale  acquisitions,  and  contract  monitoring.  For 
example,  CMM  level  3  has  been  encouraged  for  potential  contractors  of  the  U.S.  Air  Force 
[Saiedian  &  Kuzara  95,  Coffman  &  Thompson  97].  Given  the  importance  of  the  decisions 
influenced  by  appraisals  and  the  resources  required  to  implement  them,  both  contractors  and 
acquirers  must  be  confident  in  the  appraisal  results. 


1.1  The Reliability of Process Assessments

Increased  confidence  in  assessment  results  can  be  achieved  by  demonstrating  the  reliability 
of  assessment  procedures.  Reliability  is  defined  as  the  extent  to  which  the  same  measurement 


^  Appraisal is a generic term used to refer to the diagnostic method independent of the context and motivation for its application. The term “appraisal” covers both assessment and evaluation. Assessment has come to connote the use of a diagnostic method for internal process improvement (e.g., self-examination) purposes. Evaluation has come to connote the use of a diagnostic method to provide insight to a separate (typically external) organization, frequently for purposes of acquisition or contract monitoring [Dunaway 96]. The purpose for its use may be different, but the diagnostic method itself is quite similar in both instances. Since our study uses a dataset from appraisals for internal process improvement, we intentionally use the term assessment.
procedure yields the same results on repeated trials [Carmines & Zeller 79]. Lack of reliability is caused by measurement error.

As with any other measurement procedure,^ it is crucial to estimate the amount of measurement error in a SW-CMM assessment in order to provide confidence in the trustworthiness of its results. Measurement is defined as “the process of linking abstract concepts to empirical indicants” [Blalock 68]. The abstract concepts (theoretical constructs) are neither directly measurable nor observable, but can only be estimated by empirical indicants (indicators, items, measures, scales, or variables). In the SW-CMM context, maturity is an abstract concept that is indirectly measured by using key process areas (KPAs).

So what are the desirable qualities of measures? They are reliability and validity^ [Carmines & Zeller 79]. Reliability concerns the degree of repeatability and consistency of
empirical  measurements  [Zeller  &  Carmines  80].  The  amount  of  random  measurement  error** 
is  inversely  related  to  the  degree  of  reliability  of  the  measuring  instrument.  Any  measuring 
instrument  is  relatively  reliable  if  it  is  minimally  affected  by  random  measurement  error, 
where  the  term  “instrument”  implies  a  questionnaire,  assessment  procedure,  or  any  other 
form  of  data  collection  that  is  used  in  rating  software  engineering  practices.  A  set  of  KPAs  is 
a  type  of  instrument  to  collect  data  for  measuring  the  maturity  of  organizations. 


The  more  consistent  the  results  given  by  repeated  measurements,  the  higher  the  reliability  of 
the  measurement  procedure.  The  consistency  of  the  measurements  is  affected  by  ambiguities 
in  wording  and  inconsistencies  in  interpretations  by  assessors  [El-Emam  &  Goldenson  95, 
Fusaro et al. 98]. A survey of process assessments based on ISO/IEC 15504^ shows that clarity of the semantics of the process definition in the 15504 document set is the third most important variable* among 24 that affect reliability [El-Emam et al. 97]. Recent studies also show that
more  reliable  assessments  can  reduce  assessment  effort  during  consolidation  [El-Emam  et  al. 


^  Assessors  rate  whether  or  not  KPA  goals  are  achieved.  One  or  more  goal  ratings  are  combined  to 
determine  the  level  of  KPA  satisfaction.  In  turn,  those  measures  of  KPA  satisfaction  are  combined 
to  determine  the  capability  maturity  of  a  software  organization.  Thus,  both  KPA  satisfaction  and 
maturity  levels  are  in  fact  derived  measures  [ISO  01]. 

^  Validity  is  defined  as  the  extent  to  which  any  instrument  measures  what  it  is  intended  to  measure. 
In  other  words,  validity  is  related  to  accuracy,  whereas  reliability  is  related  to  repeatability  and 
consistency.  The  notion  of  validity  relates  to  “the  assumption  that  measures  of  theoretic  concepts 
should behave similarly toward theoretically relevant external variables” [Balch 74]. See Zeller and Carmines for general theory [Zeller & Carmines 80] and El-Emam and Birk for validity studies in software process assessments [El-Emam & Birk 00a-00b].

*  Systematic  error  does  not  affect  reliability  but  affects  validity.  Since  this  study  is  limited  to  reli¬ 
ability,  systematic  error  is  not  considered  in  this  analysis. 

*  ISO/IEC 15504 (Software Process Assessment) is a suite of international standards under development by Working Group 10 of Subcommittee 7 (Software Engineering Standardization) under Joint Technical Committee 1 for the International Organization for Standardization (ISO) and the International Electrotechnical Commission (IEC) [ISO 98].

*  The two most important factors in that study are “Lead assessor’s experience/competence in conducting assessment” and “Lead assessor’s knowledge of ISO/IEC 15504 or WG10
98, Jung et al. 01]. In addition, questionnaire-based studies of SW-CMM assessment team leaders and team members indicate that the consolidation of data is one of the most difficult aspects of assessment [Dunaway & Baker 01].

Skepticism does remain in our field, both about the value of process improvement in general and about the credibility of assessment results. Indeed, some critics have argued that little or no evidence exists [Fayad & Laitinen 97], while others have expressed concerns about the reliability of appraisal results [Bollinger & McGowan 91, Gray & Smith 98]. The critics are correct that credible evidence is vital. Such evidence does exist, however, and it will be reviewed more fully later.


1.2  Study Purpose and Summary of Results

The  objective  of  this  report  is  to  identify  the  dimensions  underlying  a  set  of  SW-CMM  KPA 
measures,  and  to  then  estimate  the  internal  consistency  (reliability)  of  each  dimension  of  the 
capability maturity concept. Internal consistency is estimated using Cronbach’s alpha [Cronbach 51], which is considered to be an appropriate method in the context of software process assessment and is commonly used in empirical software engineering. The results are based on maturity level data from 676 CBA IPI assessments conducted during the period from January 2000 through April 2002. To the authors’ knowledge, this is the first study of its kind to investigate the dimensionality of organizational capability maturity and estimate internal consistency by using the results of full-scale SW-CMM assessments.

The  report  should  serve  to  reduce  concerns  about  reliability  in  SW-CMM  assessments.  Using 
factor  analytic  techniques,  it  is  shown  that  the  concept  of  capability  maturity  can  be  separated 
into three distinct dimensions. The maturity level 2 KPAs are in fact closely related as the SW-CMM model suggests, and they can be treated as a single “Project Implementation” dimension. Similarly, the maturity level 3 KPAs can be considered together as like aspects of “Organization Implementation.” The maturity level 4 and 5 KPAs are combined under a single dimension termed “Quantitative Process Implementation.” The values of Cronbach’s alpha coefficient of internal consistency are higher than the recommended value of 0.9 [Fusaro et al. 98, Nunnally & Bernstein 94].

Section 2 provides a brief overview of the SW-CMM within the scope of the report and a brief theoretical background of reliability theory focused on internal consistency in software process assessment. The section also describes previous studies of reliability in software process assessments. Section 3 addresses data collection, sampling characteristics, and analysis methods. The results of the analysis are presented in Section 4. Further discussion and
conclusions  are  in  Section  5. 
2  Background and Literature Review


2.1  Appraisals  of  the  CMM  for  Software 

Process appraisals may be done for different reasons. CBA IPI assessments typically are intended to motivate organizations to initiate or continue software process improvement programs as well as to provide an accurate picture of the organization’s maturity relative to the SW-CMM.’ Software Capability Evaluations typically are used for source selection or contract monitoring. Although our results are based only on CBA IPI assessments, the issues of
reliability  that  we  discuss  apply  to  all  appraisals  regardless  of  the  purposes  for  which  they  are 
intended. 


2.2  Estimating  Measurement  Reliability 

There  are  a  variety  of  reliability  estimation  methods,  such  as  test-retest,  alternative-form, 
split-half,  and  internal  consistency*  (Cronbach’s  alpha)  [Zeller  &  Carmines  80].  This  study 
uses  Cronbach’s  alpha  because  it  is  known  to  be  the  most  appropriate  method  for  measuring 
reliability  in  software  process  assessments  [El-Emam  &  Goldenson  00,  Jung  &  Hunter  01].  It 
also  is  the  most  commonly  used  method  in  the  software  engineering  community.  Appendix  B 
presents  further  theoretical  concepts  and  empirical  research  methods  for  evaluating  reliability 
in  software  process  assessments. 


2.2.1  Cronbach’s  Alpha 

In  discussing  the  reliability  of  measurements,  a  set  of  items  (indicators)  is  posited  to  reflect 
an underlying construct. In the SW-CMM, maturity, which is neither directly measurable nor observable, can be indirectly measured by considering the assessed values of the KPAs. We
can  say  that  the  SW-CMM  uses  an  18-item  (or  KPA)  instrument  to  measure  the  maturity  of 


^  Those  less  familiar  with  CMM  models  and  appraisal  methods  should  see  Appendix  A  for  a  fuller 
review  of  the  SW-CMM,  assessments,  and  maturity  level  determination. 

^  In some studies, internal consistency includes average inter-item correlation, average item-total correlation, split-half, and Cronbach’s alpha [Trochim 01]. However, this study uses internal consistency synonymously with Cronbach’s alpha.
organizations. If the necessary data were readily available, we also could use a 52-item (goal) instrument.’


The type of scale used in most measurement instruments is a summative one [McIver & Carmines 81, Spector 92]. This means that the individual ratings $x_i$ for each item are summed to produce an overall rating score, i.e.,

$$y = \sum_{i=1}^{N} x_i ,$$

where $N$ is the number of items in an instrument. One property of the covariance matrix for a summative rating is that the sum of all terms in the matrix gives exactly the variance of the scale as a whole, i.e.,

$$\sigma_y^2 = \sum_{i=1}^{N} \sum_{j=1}^{N} \sigma_{ij} ,$$

where $\sigma_{ij}$ denotes the covariance between items $i$ and $j$; if $i = j$, then $\sigma_{ii} = \sigma_i^2$. The variability in a set of item scores is considered to consist of two components as follows:

•  The error terms are the source of unique variation that each item possesses, i.e., $\sum_{i=1}^{N} \sigma_i^2$.

•  The signal component of variance that is considered to be attributable to a common source due to capability maturity is the difference between total variance and unique variance, i.e., $\sigma_y^2 - \sum_{i=1}^{N} \sigma_i^2$. Thus, the ratio of true to observed variance is $\left(\sigma_y^2 - \sum_{i=1}^{N} \sigma_i^2\right) / \sigma_y^2$.

To express this in relative terms, the number of elements in the covariance matrix of a summative rating must be considered. The total number of elements in the covariance matrix is $N^2$, and the total number of communal elements is $N^2 - N$. Thus, Cronbach’s alpha becomes:

$$\alpha = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N} \sigma_i^2}{\sigma_y^2}\right) \quad \text{or} \quad \alpha = \frac{N\rho}{1 + \rho(N-1)} ,$$

where $N$ is the number of items; $\sigma_i^2$ and $\sigma_y^2$ are the unique variation of item $i$ and the total variation, respectively; and $\rho$ is equal to the mean inter-item correlation.

Cronbach’s alpha is a generalization of Kuder-Richardson formula number 20 (KR20), which estimates the reliability of items scored dichotomously with zero or one [Kuder & Richardson 37]. KR20 is computed as follows:

$$\mathrm{KR20} = \frac{N}{N-1}\left(1 - \frac{\sum_{i=1}^{N} p_i (1 - p_i)}{\sigma_y^2}\right) ,$$

where $N$ is the number of dichotomous items; $p_i$ is the proportion responding “positively” to item $i$; and $\sigma_y^2$ is equal to the variance of the total composite. KR20 has the same interpretation as Cronbach’s alpha.
The SW-CMM includes 52 goals, with 20, 17, 6, and 9 goals in maturity levels 2, 3, 4, and 5, respectively.
Since KPAs in SW-CMM assessments are determined dichotomously as zero (“Not Satisfied”) or one (“Fully Satisfied”), this study can use KR20 without making any assumption about rating scale type. However, if a maturity or capability level is measured with a scale that uses more than two categories, such as in ISO/IEC 15504, reliability estimation does require assumptions about scale type. Since KR20 gives the same value as Cronbach’s alpha, this study uses the more popular term Cronbach’s alpha rather than KR20.
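
As an illustration of the formulas above, the following is a minimal Python sketch, assuming NumPy is available. The function names and the small rating matrix are hypothetical and are not taken from the study's dataset. The sketch computes Cronbach's alpha and KR20 on the same dichotomous ratings and shows that the two coincide when item variances are computed as population variances, since the variance of a zero/one item is p(1 - p).

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a (cases x items) matrix of ratings."""
    x = np.asarray(ratings, dtype=float)
    n_items = x.shape[1]
    item_vars = x.var(axis=0)        # population variance of each item
    total_var = x.sum(axis=1).var()  # variance of the summed score y
    return n_items / (n_items - 1) * (1.0 - item_vars.sum() / total_var)

def kr20(ratings):
    """Kuder-Richardson formula 20 for zero/one ratings."""
    x = np.asarray(ratings, dtype=float)
    n_items = x.shape[1]
    p = x.mean(axis=0)               # proportion rated 1 for each item
    total_var = x.sum(axis=1).var()
    return n_items / (n_items - 1) * (1.0 - (p * (1.0 - p)).sum() / total_var)

def standardized_alpha(n_items, mean_r):
    """Alpha expressed through the mean inter-item correlation."""
    return n_items * mean_r / (1.0 + mean_r * (n_items - 1))

# Hypothetical ratings: 6 organizations (rows) rated on 4 items (columns),
# with 1 = "Fully Satisfied" and 0 = "Not Satisfied".
toy_ratings = np.array([
    [1, 1, 1, 1],
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 0],
    [1, 1, 1, 1],
])

alpha = cronbach_alpha(toy_ratings)
print(round(alpha, 4), round(kr20(toy_ratings), 4))
print(round(standardized_alpha(6, 0.6), 4))
```

For this toy matrix both estimators give 5/6, and six items with a mean inter-item correlation of 0.6 give a standardized alpha of 0.9.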

What constitutes a satisfactory level of reliability, as measured by Cronbach’s alpha, depends on how a measure is being used. In the early stages of the research on an assessment instrument, reliabilities of 0.7 or higher are considered sufficient for narrow constructs [Cronbach 51, Nunnally & Bernstein 94] and 0.55 to 0.7 for moderately broad constructs [Van de Ven & Ferry 80]. For basic research, a value of 0.8 is acceptable. In applied settings where important decisions are being made with respect to assessment scores, a reliability of 0.9 is the minimum that would be acceptable [Nunnally & Bernstein 94].

Since  maturity  levels  are  in  fact  used  in  making  important  decisions,  the  minimum  tolerable 
value  of  internal  consistency  in  the  SW-CMM  should  be  set  at  0.9.  In  ISO/IEC  15504,  the 
minimum  value  also  has  been  set  at  0.9  [Fusaro  et  al.  98]. 


2.2.2  Dimensionality 

Cronbach’s  alpha  assumes  that  the  construct  being  measured  is  unidimensional  [Carmines  & 
Zeller  79].  As  the  name  implies,  unidimensional  scaling  is  relevant  to  those  situations  in 
which  it  is  presumed  that  there  exists  a  single  dimension  underlying  a  set  of  data  items 
[McIver & Carmines 81]. In contrast to unidimensional models, multidimensional scaling implies that there is more than a single dimension that underlies a set of items. If the SW-CMM maturity scale were multidimensional, then it would be more appropriate to compute
the  internal  consistency  for  each  dimension  separately. 
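
The effect of ignoring dimensionality can be seen in a small constructed case. The following Python sketch, assuming NumPy, uses hypothetical data in which two pairs of items measure two unrelated dimensions perfectly. It is an illustration of the principle only, not a reproduction of the study's analysis.

```python
import numpy as np

def cronbach_alpha(ratings):
    """Cronbach's alpha for a (cases x items) matrix of ratings."""
    x = np.asarray(ratings, dtype=float)
    n_items = x.shape[1]
    item_vars = x.var(axis=0)
    total_var = x.sum(axis=1).var()
    return n_items / (n_items - 1) * (1.0 - item_vars.sum() / total_var)

# Hypothetical construct: dimension A and dimension B are uncorrelated.
dim_a = np.array([0, 0, 1, 1])
dim_b = np.array([0, 1, 0, 1])

# Items 0-1 reflect dimension A only; items 2-3 reflect dimension B only.
items = np.column_stack([dim_a, dim_a, dim_b, dim_b])

alpha_a = cronbach_alpha(items[:, :2])   # alpha within dimension A
alpha_b = cronbach_alpha(items[:, 2:])   # alpha within dimension B
alpha_pooled = cronbach_alpha(items)     # alpha over all four items

print(alpha_a, alpha_b, round(alpha_pooled, 4))
```

Each dimension taken separately is perfectly consistent (alpha of 1.0), while pooling the four items into one scale gives an alpha of about 0.67, which understates the consistency of either dimension.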

Two factor-analytic models, factor analysis and principal component analysis, can be used to investigate the dimensionality of process attributes. The objective of factor analysis is to search for, identify, or confirm the underlying factor(s) or construct(s) and to explain the correlation among items. The objective of principal component analysis is to reduce the number of variables to a few components, where each of the components can be represented as a linear combination of the corresponding variables [Sharma 96].


The scree plot [Cattell 66] and the eigenvalue-greater-than-one rule [Kaiser 70] are the two most popular methods to determine the number of factors (or components). The scree plot is a plot of the eigenvalues against the factor numbers. Cattell recommended retaining factors


An eigenvalue represents the sum of squared factor loadings for all of the items in that factor. See Section 4.1.2 for an example.
above the “elbow” and rejecting those below it. The eigenvalue-greater-than-one rule retains only the factors that have an eigenvalue greater than 1. The rationale of this rule is that the amount of variance extracted by each factor should, at a minimum, be equal to the variance of at least one variable. Since the rule may lead to a greater or a fewer number of factors than are necessary [Cliff 88], it is recommended that this rule be used in conjunction with other rules [Sharma 96].
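
The eigenvalue-greater-than-one rule can be sketched in a few lines of Python, assuming NumPy. The correlation matrix below is hypothetical (six items in two groups of three, correlated 0.8 within a group and 0.2 across groups) and the function names are illustrative only.

```python
import numpy as np

def sorted_eigenvalues(corr):
    """Eigenvalues of a symmetric correlation matrix, largest first."""
    return np.linalg.eigvalsh(corr)[::-1]

def kaiser_rule(eigvals):
    """Number of factors retained by the eigenvalue-greater-than-one rule."""
    return int(np.sum(eigvals > 1.0))

# Hypothetical correlation matrix for six items in two groups of three.
within, across = 0.8, 0.2
corr = np.full((6, 6), across)
corr[:3, :3] = within
corr[3:, 3:] = within
np.fill_diagonal(corr, 1.0)

eigvals = sorted_eigenvalues(corr)
n_factors = kaiser_rule(eigvals)

# Listing the eigenvalues in order is the tabular equivalent of a scree plot.
for k, value in enumerate(eigvals, start=1):
    print(k, round(float(value), 3))
print("factors retained:", n_factors)
```

For this matrix the eigenvalues are 3.2 and 2.0 followed by four values of 0.2, so the rule retains two factors, and the sharp drop after the second eigenvalue is where a scree plot would show its elbow.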

Factor  loadings  are  the  correlations  between  the  items  and  the  factors  (underlying  constructs). 
The loadings are criteria to determine the quality of factor classifications. Sharma recommended a cut-off value of 0.6 [Sharma 96]. Comrey provided guidelines for interpreting factor loadings: “fair” (0.45), “good” (0.55), “very good” (0.63), and “excellent” (0.71) [Comrey 73].
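
The relationship between loadings and eigenvalues can be checked numerically. The Python sketch below, assuming NumPy, computes unrotated principal component loadings for a hypothetical correlation matrix. Rotation, which factor-analytic studies commonly apply before interpreting loadings, is deliberately omitted, so the numbers are illustrative only.

```python
import numpy as np

def pca_loadings(corr):
    """Unrotated principal component loadings and their eigenvalues."""
    eigvals, eigvecs = np.linalg.eigh(corr)   # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]         # reorder to largest first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Loading of item i on component k = eigenvector entry * sqrt(eigenvalue).
    return eigvecs * np.sqrt(eigvals), eigvals

# Hypothetical correlation matrix: two groups of three items each,
# correlated 0.8 within a group and 0.2 across groups.
corr = np.full((6, 6), 0.2)
corr[:3, :3] = 0.8
corr[3:, 3:] = 0.8
np.fill_diagonal(corr, 1.0)

loadings, eigvals = pca_loadings(corr)

# The sum of squared loadings in each column reproduces that eigenvalue.
sum_sq = (loadings ** 2).sum(axis=0)
print(np.round(sum_sq[:2], 3), np.round(eigvals[:2], 3))

# Items whose absolute loading on the first component meets Sharma's 0.6 cut-off.
strong = np.abs(loadings[:, 0]) >= 0.6
print(strong)
```

The column sums of squared loadings match the eigenvalues, which is the property described in the footnote on eigenvalues.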


2.3  Previous Studies of Reliability in Process Assessments

2.3.1  Internal  Consistency 

Most  studies  of  reliability  in  software  process  assessments  have  been  conducted  as  part  of  the 
Software Process Improvement and Capability dEtermination (SPICE) Trials” of the emerging International Standard ISO/IEC 15504. Two important objectives in the Trials were to
evaluate  reliability  and  validity  of  ISO/IEC  15504  conformant  assessments  [El-Emam  & 
Goldenson  95].  The  Trials  team  developed  a  study  plan  and  concept  of  reliability  in  software 
process  assessment  based  on  ISO/IEC  15504.  The  Phase  2  SPICE  Trials  team  published  a 
summarized  final  result  [Jung  et  al.  01]. 

Results from the Phase 2 SPICE Trials reported high internal consistency;” however, it became necessary to investigate any changes in reliability in the subsequent version of the Proposed Draft Technical Report (PDTR). Recently, Jung reevaluated the internal consistency of the ISO/IEC TR 15504 capability dimension [Jung 02a]. Results from assessments done in Korea again showed a high Cronbach’s alpha value of 0.89 for capability levels 1 through 3, which is particularly noteworthy since the assessments were done using the English language version of the TR.


"  The  SPICE  Trials  were  performed  in  three  broad  phases.  Phase  1  took  place  in  1995,  Phase  2  from 
September  1996  to  June  1998,  and  Phase  3  began  in  July  1998.  In  November  2001,  SPICE  Trials 
was  reshaped  with  the  name  of  SPICE  Network.  SPICE  Network  consists  of  SPICE  Research, 
SPICE Benchmarking Forum, and SPICE Network Partner [ISO/WG10 01]. Empirical studies at Phase 2 were published [ISO/WG10 98 & 99].

”  Interrater agreement also has been used to estimate the reliability of software process appraisals. Estimates are made of the extent to which two assessors or teams of assessors agree when making independent judgments about the same software engineering processes. See Appendix B for further information.


CMU/SEI-2002-TR-037 


The  first  reliability  study  of  the  1987  maturity  questionnaire  was  presented  10  years  after  its 
publication  [Fusaro  et  al.  98].  Data  for  the  study  were  from  a  Delphi  panel  and  a  mail  survey. 
Fusaro et al. estimated an internal consistency of 0.94 by utilizing Cronbach’s alpha, higher than the 0.90 reported earlier by Humphrey and Curtis. The internal consistency of the ISO/IEC PDTR 15504 capability dimension was estimated at 0.90 in the same study. Moreover, Fusaro and his colleagues assumed that capability and maturity are unidimensional constructs. If they were multidimensional, their results would be deflated estimates.

Another  SW-CMM  study,  based  on  a  goal-level  questionnaire  administered  to  45  projects  in 
one large company, resulted in a Cronbach’s alpha of 0.70 [Krishnan & Kellner 99]. However, they used a 5-point rating scale instead of the dichotomous scale typically used with the SW-CMM. Moreover, they did not provide information about the dimensionality of their data. As just noted, Cronbach’s alpha assumes unidimensionality. Their result is all the more compelling if there is in fact more than one separate dimension of capability maturity.


2.3.2  Dimensionality 

Curtis collected questionnaire data on SW-CMM KPA goals covering maturity levels 2 and 3 from 3 organizations and performed principal component analysis for each organization [Curtis 96]. His results showed a multidimensional construct that included what he termed “planfulness,” coordinated commitments, subcontractor management, quality assurance, configuration management, and process definition. Clark conducted a correlational study of levels 2
and  3  CMM  KPAs  using  data  collected  from  50  organizations  [Clark  97].  El-Emam  and 
Goldenson  performed  a  principal  component  analysis  based  on  the  Clark  results  and  found  a 
multidimensional  factor  structure  that  differed  somewhat  from  Curtis’  results  [El-Emam  & 
Goldenson  00]. 

The  differences  in  the  results  of  these  studies  arise  from  differences  in  their  questions  and 
data  collection  methods.  The  Curtis  study  is  based  on  several  individuals  within  the  same 
organization.  Clark’s  study  is  based  on  organizational-level  data.  Individual  level  design  is 
useful for reliability studies, but the unit of observation ideally should be the organization or project rather than the individual [El-Emam & Goldenson 00]. In addition, Curtis examined KPA goal satisfaction, while Clark studied the implementation of KPAs. As will be seen in Section 4, both sets of results are different from our own, which are based on a much larger and more varied dataset.


ISO/IEC JTC1 has a variety of paths for developing International Standards [ISO 99]. One of them is through a published technical report (TR). A TR follows a series of stages such as NP (New Proposal), WD (Working Draft), PDTR (Proposed Draft Technical Report), DTR (Draft Technical Report), TR (Technical Report), and IS (International Standard). Assessments in the Phase 2 SPICE Trials were based on the PDTR version. At the time of writing, ISO/IEC 15504 was still in the second draft stage of TR (sometimes called TR2).
El-Emam investigated the dimensionality of the ISO/IEC 15504 capability scale [El-Emam 98]. He described two dimensions underlying a set of process attributes. The first dimension, “Process Implementation,” consists of the process attributes in capability levels 1 through 3. The second dimension, “Quantitative Process Implementation,” covers the process attributes in capability levels 4 and 5. Jung and Hunter analyzed a total of 691 process instances assessed during the Phase 2 SPICE Trials (from September 1996 to June 1998) [Jung & Hunter
02].  They  reconfirmed  the  multidimensionality  of  the  ISO/IEC  15504  capability  measures 
and  provided  Cronbach’s  alpha  values  of  0.88  and  0.87  for  the  two  dimensions,  respectively. 
3  Research Method


3.1  Data  Source 

3.1.1  Data  Collection 

Lead assessors authorized by the Software Engineering Institute (SEI℠) are required to provide reports to the SEI for their completed assessments. Assessment data on the reports are kept in an SEI repository called the Process Appraisal Information System (PAIS). The PAIS includes information for each assessment on company and appraised entity, KPA profiles, organization and project context, functional area representatives groups, findings, and
related  data. 

Submitting  an  assessment  report  does  not  imply  that  the  SEI  certifies  any  assessment  findings 
or  maturity  levels.  All  assessment  data  are  kept  confidential  and  are  available  only  to  SEI 
personnel  on  a  need-to-know  basis  for  research  and  development.  Information  in  the  PAIS  is 
used  to  produce  industry  profiles  or  as  aggregated  data  for  research  publications,  and  the  SEI 
publishes  a  Maturity  Profile  report  twice  a  year  (http://pcaf/PAIS/). 

The  dataset  that  was  analyzed  for  this  study  was  extracted  from  appraisal  reports  in  the  PAIS 
for  the  period  of  January  2000  through  April  2002.  During  the  period,  948  appraisals  were 
reported  to  the  SEI,  707  (74.58%)  of  which  were  CBA IPI  assessments  of  the  SW-CMM.  The 
remaining  appraisals  cover  a  variety  of  appraisal  models  and  methods  such  as  SW-CMM 
Software  Capability  Evaluation  (SCE),  CMM  for  Software  Acquisition  (SA-CMM),  and 
CMM Integration℠ (CMMI®).


Not all CBA IPI assessments include KPA rating profiles, since the determination of a maturity level or KPA ratings is optional and is provided at the discretion of the assessment sponsor. KPA ratings and maturity levels exist for 676 assessments, including 362 from organizations in the U.S. and 314 from non-U.S.-based organizations.


CMM  Integration  and  SEI  are  service  marks  of  Carnegie  Mellon  University. 

The  data  entry  form  can  be  found  in  http://seir.sei.cmu.edu/ROE/. 

®  CMMI  is  registered  in  the  U.S.  Patent  and  Trademark  Office  by  Carnegie  Mellon  University. 
Also note that Software Subcontract Management is excluded from the analyses, since that maturity level 2 KPA was not assessed by 74.56% (504/676) of the organizations in the sample.


3.1.2  Dataset Analyzed

Many  assessments  do  not  consider  all  KPAs  up  to  and  including  maturity  level  5.  As  seen  in 
Table  1,  four  overlapping  datasets  were  created  based  on  the  KPAs  that  were  included  in  the 
model scope of the 676 assessments. By definition, all of the maturity level 2 KPAs were assessed in all 676 organizations. Those that assessed only the level 2 KPAs are removed from the second dataset, leaving 408 that assessed the KPAs through maturity level 3 or higher. Similarly, 156 assessments covered KPAs through level 4 or higher, and 78 covered the KPAs through level 5. Reading across the first row in Table 1, note that 93, 295, 176, 60, and 52 of the assessments were determined to be at maturity levels 1, 2, 3, 4, and 5, respectively. Reading down the columns, note also that 26 of the 93 organizations that were determined to be at maturity level 1 also were assessed against the level 3 KPAs. One of them also was assessed against the level 4 KPAs, and one included the level 5 KPAs in the model scope of its assessment. Similarly, 94 of those determined to be at level 2 also were assessed unsuccessfully against some or all of the level 3, 4, or 5 KPAs. The same interpretation applies for the organizations that were determined to be at maturity levels 3 and 4. Of course, all of those that were determined to be at level 5 fully satisfied the goals of all of the KPAs.


Table 1: Number of Organizations at Each Maturity Level

Dataset                            ML* 1   ML 2   ML 3   ML 4   ML 5   Total
Dataset 1: ML 2 KPAs’ rating          93    295    176     60     52     676
Dataset 2: ML 2-3 KPAs’ rating        26     94    176     60     52     408
Dataset 3: ML 2-4 KPAs’ rating         1      9     34     60     52     156
Dataset 4: ML 2-5 KPAs’ rating         1      7     10      8     52      78

* ML denotes maturity level.


We  investigated  the  dimensions  of  maturity  and  then  estimated  internal  consistency  for  each 
dimension  separately  for  each  of  the  four  datasets.  Consistent  results  from  the  four  datasets 
provide  increased  confidence  in  our  conclusions. 


3.1.3 Unit of Analysis

In CBA IPI assessments, several projects are assessed in a single organization. The KPA
profiles for an organization are the aggregate of assessment team judgments across those projects
to produce a single maturity level for the entire organization within the scope of the
assessment. Thus, the unit of analysis in this study is an organization.

Our dataset consists of KPA rating profiles from 676 organizations. In experimental terms,
each of the 676 organizations becomes a case. Each case includes scores for that case on one
or more attributes, where an attribute is defined as some characteristic of the case and the
score is a value of the attribute. For instance, each assessment reported to the SEI becomes a
single case, in which the case includes determinations (scores) of KPAs (attributes).


3.2  Sampling  Characteristics  of  the  Dataset 

Statistical  analyses  and  interpretations  of  the  data  at  hand  depend  on  the  selection  of  a  sample 
(subset)  from  a  population.  Population  inference  requires  random  sampling.  Thus,  we  first 
examine  the  sampling  characteristics  of  our  dataset. 


The simplest form of sampling is a (simple) random sample. The random sample is defined
as “a set of cases selected from a well-defined population of cases by a process that ensures
that every sample containing the same number of cases has the same chance of being the one
selected” [Lunneborg 00]. In the SW-CMM assessment context, this definition explicitly
implies two requirements: 1) a well-defined population of assessment cases from which to
sample; and 2) a well-defined random process for selecting the sample.

The assessments reported to the PAIS database do not satisfy these two requirements. The
population and the size of its assessments cannot be clearly defined, and the assessed
organizations are not selected on a random basis. Rather, the assessments in PAIS are a self-selected
sample, i.e., the assessed organizations voluntarily participated in CBA IPI assessments to
improve their software process or were required to do so by the sponsors of their work.

Hence,  our  analyses  are  based  on  nonrandom  sampling  methods. 

In a nonrandom design, the dataset itself in the PAIS is a population of assessment cases, where
the population is called a local population or a set of available cases [Lunneborg 00]. In
addition, a sample implies a random sample from that local population.

Because the cases are not a random sample, statistical inferences to a population beyond the
dataset at hand are not possible. It is sensible, however, to apply the descriptions to the local
population. The descriptions are not inferences to a population. Rather, they should be considered
as descriptive statistics, and they neither can be generalized to others nor have causal
implications. Typical descriptions include measures of central tendency (e.g., means or medians),
dispersion (e.g., variance or control limits), or relationship (e.g., correlation coefficients or
internal consistency).

Descriptions based on a nonrandom sample need assurance that they truly characterize the
available cases and that they are stable [Lunneborg 00, Montgomery et al. 98]. An available
set of cases such as our assessment dataset cannot be assumed to have the same degree of
homogeneity as a random sample. A fair description is a stable one that is relatively
uninfluenced by the presence of specific cases. Thus, results of this report should be tested for their
stability (homogeneity).
3.3 Data Analysis

As we just noted, since our dataset is not a random sample, we need to examine the stability
of its estimated internal consistency. For this purpose, we use a subsample technique to
evaluate the stability. We recompute internal consistency on a sequence of subsamples, where
each subsample contains some but not all the available cases [Lunneborg 00]. If the internal
consistency changes noticeably when some cases are left out, the reliability is not
stable. The same statistical analyses usually can be used in both random and nonrandom cases,
but a nonrandom case requires an assumption of stability.


3.3.1  Generating  a  Set  of  Bootstrap  Subsamples 

How one forms the subsamples depends on the information that is available about the manner
in which the case data are collected, i.e., structured and unstructured datasets. In a structured
dataset, the subsamples are formed based on contextual information such as time of data
collection, site, or investigator (experimenter, teacher, etc.). In contrast, sufficient contextual
information is missing in an unstructured dataset. The structure of our dataset is unclear, except
for the organizational location classifications of U.S. and non-U.S., but the usage of
organizational location as a structure criterion results in an insufficient number of observations for
estimating the alpha value at maturity level 5. Fortunately, however, resampling of an
unstructured dataset also has the advantage of providing confidence intervals for Cronbach’s
alpha values.

The stability of internal consistency in an unstructured dataset is examined using a set of
subsamples that are generated from a bootstrap* resampling procedure [Lunneborg 00].** The
procedure draws a sample of size n without replacement, where n is the number of
observations in the original dataset. Then, a subsample can be obtained by taking a half of the
sample, i.e., a half-size sample. This process is repeated B times, where B is a large number as
high as 1,000. The reason for choosing half-samples is that there will be a larger number of
distinct subsamples of that size than of any other size that could be drawn from the full
sample [Lunneborg 00]. This study uses an S-Plus bootstrap routine to generate its half-size
samples [Mathsoft 99].


* This bootstrap method should not be confused with the Bootstrap model for process assessment
[Kuvaja 99].

** Evaluating stability in a structured dataset requires computing a description of the original dataset
and then recomputing the same description for each subsample partitioned by time, site, or other
criteria. If the internal consistency values of the subsamples are close to each other and to the
description of the original dataset, then the description is considered to be stable. On the other hand,
if the description of a particular subsample is remarkably different from those of the other
subsamples and the original dataset, this implies that the description of the original dataset depends
heavily on the cases omitted from that subsample.
A bootstrap method has been successfully used previously in empirical software engineering.
El-Emam and Garro estimated the number of SPICE assessments by utilizing a
capture-recapture method [El-Emam & Garro 00]. Jung and Hunter utilized a bootstrap method in
computing confidence levels for the capability levels for each ISO/IEC 15504 process [Jung
& Hunter 01].
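
The half-sample procedure described in this section can be sketched in a few lines of Python. This is only an illustrative stand-in for the S-Plus routine used in the study; the function name `half_samples`, the fixed seed, and the use of integer case identifiers are assumptions made for the example.

```python
import random


def half_samples(cases, n_replicates=1000, seed=0):
    """Yield half-size subsamples drawn without replacement (hypothetical helper)."""
    rng = random.Random(seed)
    half = len(cases) // 2
    for _ in range(n_replicates):
        # A draw of size n without replacement is a random permutation of the cases;
        # keeping the first half of it gives one half-size subsample.
        permuted = rng.sample(cases, len(cases))
        yield permuted[:half]


# Illustration with 676 placeholder case identifiers (not the real assessment data).
subsamples = list(half_samples(list(range(676)), n_replicates=5))
print(len(subsamples), len(subsamples[0]))  # prints: 5 338
```

In practice each subsample would be passed to whatever routine computes the statistic of interest (here, Cronbach's alpha), giving B replicate values.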


3.3.2 Examining the Stability of the Estimated Results

The bootstrap resampling procedure in this report can be used for parameter estimation or
confidence intervals on the mean or median (difference). If 1,000 subsamples are taken, and
an internal consistency value is computed for each of the 1,000 subsamples, then the lower
and upper limits of the confidence interval for internal consistency can be determined at
percentiles of 2.5% and 97.5%, respectively. The histogram of 1,000 replications is called the
empirical reference distribution. The confidence interval of the empirical reference
distribution is called the empirical confidence interval (ECI).
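
As a rough sketch of how the ECI limits could be read off the replicate values, the following Python function takes the 2.5% and 97.5% percentiles of a list of replicates. The function name and the linear-interpolation percentile rule are assumptions; statistical packages differ in how they interpolate between order statistics, and the report does not say which rule was used.

```python
def empirical_confidence_interval(replicates, level=0.95):
    """Return (lower, upper) percentile limits of the replicate values."""
    ordered = sorted(replicates)
    tail = (1.0 - level) / 2.0

    def percentile(p):
        # Linear interpolation between neighbouring order statistics (assumed rule).
        pos = p * (len(ordered) - 1)
        lo = int(pos)
        hi = min(lo + 1, len(ordered) - 1)
        return ordered[lo] + (pos - lo) * (ordered[hi] - ordered[lo])

    return percentile(tail), percentile(1.0 - tail)


# Hypothetical replicate values, evenly spread purely for illustration.
replicates = [0.93 + 0.00004 * i for i in range(1000)]
lower, upper = empirical_confidence_interval(replicates)
print(round(lower, 3), round(upper, 3))
```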


Internal consistency from the original dataset should be solidly in the middle of the empirical
reference distribution in order to be considered stable. It should not be at or near the limits of
the empirical reference distribution. The difference between a value of internal consistency in
the original dataset and the mean of those in B subsamples is called bias. To define bias,
let $\hat{\theta}_b^*$ be the value of internal consistency at the $b$th subsample, where
$b = 1, \ldots, B$; $B$ is the number of replications (here, 1,000); and $\hat{\theta}$ is the
estimated internal consistency from the original dataset. Then the bias, BIAS, is defined as follows:

$$\mathrm{BIAS} = \frac{\sum_{b=1}^{B} \hat{\theta}_b^*}{B} - \hat{\theta}.$$

In addition, the sample-to-sample variability of the estimated internal consistency is the
standard deviation of the sampling distribution of B replicates. This is called the standard error
(SE) of the estimate and is defined as follows:

$$\mathrm{SE} = \sqrt{\frac{\sum_{b=1}^{B} \left(\hat{\theta}_b^* - \bar{\theta}^*\right)^2}{B - 1}},
\quad \text{where } \bar{\theta}^* = \frac{\sum_{b=1}^{B} \hat{\theta}_b^*}{B}.$$


The  degree  of  bias  is  evaluated  against  the  SE  of  the  sampling  distribution  of  B  replicates.  If 
the  bias  is  large  relative  to  the  SE,  there  is  a  problem  (unstable).  A  criterion  for  judgment  is 
that  if  the  absolute  value  of  the  bias  is  less  than  one-quarter  the  size  of  the  SE,  the  bias  can  be 
ignored  safely  [Efron  &  Tibshirani  93].  One  can  conclude  that  the  internal  consistency  from 
the  original  dataset  is  stable. 
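
The stability check described above can be illustrated with a short Python sketch. The function names are hypothetical, and the B - 1 denominator in the standard deviation is an assumption (the usual sample standard deviation); the one-quarter threshold is the criterion cited from [Efron & Tibshirani 93].

```python
import math


def bootstrap_bias_and_se(observed, replicates):
    """Bias = mean of replicates minus observed value; SE = std. dev. of replicates."""
    b = len(replicates)
    mean_rep = sum(replicates) / b
    bias = mean_rep - observed
    se = math.sqrt(sum((t - mean_rep) ** 2 for t in replicates) / (b - 1))
    return bias, se


def bias_is_negligible(bias, se, ratio=0.25):
    """Apply the rule of thumb: |bias| below one-quarter of the SE can be ignored."""
    return abs(bias) < ratio * se


# Check with the bias and SE reported for the maturity level 2 KPAs (Table 3).
print(bias_is_negligible(0.0004, 0.0077))  # prints: True
```

With those reported values, the bias is roughly 5% of the SE, well under the one-quarter threshold.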
4 Results


4.1  Descriptive  Summary  of  the  Dataset 

Our dataset is based on the 676 CBA IPI assessments that were reported to the SEI during the
period of January 2000 through April 2002. Figure 1 shows the number of assessed
organizations by site type in both the U.S. and non-U.S.* Over a third of the U.S. organizations are
government contractors, but almost 60 percent produce products for the commercial market
or for their own use in-house. A noticeable minority are themselves U.S. government and
military organizations. While over 10 percent of the non-U.S. organizations are contractors
for the U.S. government, the same proportion develop or maintain software for their own
in-house use, and three-quarters of them develop or maintain software for commercial sales.


Figure 1: Types of Organization Assessed

The dataset includes 2,860 projects from 676 CBA IPI assessments. Figure 2 is a box and
whisker plot showing the variation in the number of projects in the assessments (missing for
5 assessments).** The minimum and maximum numbers of projects assessed in one
assessment are 1 and 21, respectively. The mean and median numbers of projects assessed in a single
assessment are 4.26 and 4, respectively. This number is essentially the same as the recommended 4
projects for CBA IPI assessments [Dunaway & Baker 01].


* Two organizations are missing in each region.

** See Appendix C for an explanation of box and whisker plots.


Figure  2:  A  Box  and  Whisker  Plot  Showing  the  Variation  of  Projects  Assessed 

Figure 3 shows the number of Functional Area Representative (FAR*) group interviews in
671 assessments. The mean is 6.06 and the median is 6. The maximum number of FAR
groups interviewed is 18.


Figure 3: Number of FAR Groups Interviewed per Assessment


* Functional Area Representatives are practitioners who have technical responsibilities in various
areas that support their organizations’ software development or maintenance projects, e.g.,
configuration management or quality assurance. Selected FAR interviewees should be a representative
sample of the assessed organization’s technical staff. FAR interviewees should be practitioners,
not managers or staff. No two individuals who have a reporting relationship to each other should
be in a FAR interview session together.
Figure 4 shows the total number of functional area representatives interviewed in the same
671 assessments. The mean and median are 32.50 and 30, respectively. In one case, there were
127 interviewees.


Figure 5 shows the average number of FAR group members per interview based on 670
assessments. The mean and median are 5.05 and 5.45, respectively. This number is within the
recommended range of “four to eight participants” [Dunaway & Masters 96]. However, only
392 organizations (58.51%) fall within the recommended range; approximately 40% fall
outside it. As such, further investigation is required.


Figure  5:  Average  Number  of  FAR  Interviewees 

Figure 6 shows the distribution of the maturity level of the 676 organizations assessed. The
most frequent level is 2 (Repeatable) with 43.64% (295) of the organizations. The next is
level 3 (Defined). The number of level 4 organizations is slightly higher than that of level
5, i.e., 8.88% (60) versus 7.69% (52).


Figure  6:  Distribution  of  Maturity  Levels 

It is hard to imagine that the proportion of organizations at maturity level 2 is larger than that
at maturity level 1 in software organizations throughout the world. A study by Fayad and
Laitinen indicates that most development organizations are at maturity level 1 [Fayad &
Laitinen 97].
The fact that the dataset is not a random sample makes it impossible to interpret the
distribution. There is no way to determine whether organizations assessed according to the
SW-CMM are a representative (random) sample of the industry at large. It is most likely that, as
early adopters of new technology, and specifically as organizations interested in SPI, these
organizations are from the “high end” of the maturity spectrum. This phenomenon has been
detected in the SPICE Trials as well [Rout et al. 98]. These results partially support the
assumption of nonrandom sampling in this study.


4.2  Analysis  Results 

Analysis results are presented in the sequence of datasets summarized in Table 1. Cronbach’s
alpha as a measure of internal consistency is computed separately for more than one
dimension. Hence, we first describe the dimensions that underlie the KPAs in the SW-CMM.


4.2.1  Cronbach’s  Alpha  in  Maturity  Level  2  KPAs 

As seen in Table 1, all of the five KPAs at maturity level 2 except Software Subcontract
Management were rated in 676 organizations. To investigate the dimensionality, factor
analysis with a principal component analysis was performed for the five KPAs. The scree plot of
the five KPAs in Figure 7 shows a break after the first component (i.e., a unidimensional
characteristic of the five KPAs). In addition, only one eigenvalue of 4.18 satisfies the
greater-than-one rule. The scree plot and eigenvalue-greater-than-one rule reach the same conclusion.
The factor loadings in Table 2 exceed the recommended cut-off value of 0.6 [Sharma 96] as
well as an “excellent” criterion of 0.71 [Comrey 73]. Thus, we conclude that the five KPAs at
maturity level 2 are items of a single construct, which we call the “Project Implementation”
dimension.


Figure  7:  The  Scree  Plot  of  the  Five  KPAs  at  Maturity  Level  2 
A factor loading ($\lambda_i$) indicates the correlation between a variable (KPA determination) and
the underlying factor. The eigenvalue equals the sum of the squared factor loadings for a
factor over all items, i.e., $\sum_i \lambda_i^2 = 0.87^2 + 0.94^2 + 0.92^2 + 0.92^2 + 0.92^2 = 4.18$.
The value of 83.59%, in the last row of Table 2, denotes the proportion of variance explained by the
common factor, i.e., the degree of factorial determination of variables. It is computed as
the eigenvalue divided by the number of items (i.e., 4.18/5=0.8359).
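
The arithmetic in this paragraph can be reproduced directly from the rounded loadings reported in Table 2. The short Python sketch below is only a check of that calculation; the variable names are illustrative.

```python
# Rounded factor loadings for the five maturity level 2 KPAs (Table 2).
loadings = [0.87, 0.94, 0.92, 0.92, 0.92]

# Eigenvalue of the factor = sum of squared loadings over all items.
eigenvalue = sum(loading ** 2 for loading in loadings)

# Proportion of variance explained = eigenvalue / number of items.
explained = eigenvalue / len(loadings)

print(round(eigenvalue, 2), round(100 * explained, 2))  # prints: 4.18 83.59
```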


Table 2: Factor Loadings ($\lambda_i$) for the Five KPAs at Maturity Level 2

KPA name                                     Factor 1
                                             (“Project Implementation”)
Requirements Management                      0.87
Software Project Planning                    0.94
Software Project Tracking and Oversight      0.92
Software Quality Assurance                   0.92
Software Configuration Management            0.92
% of variance explained                      83.59%

The  (observed)  Cronbach’s  alpha  of  the  five  KPAs  is  0.9495  as  seen  in  Table  3.  This  value 
exceeds  a  standard  recommendation  of  0.9  [Fusaro  et  al.  98,  Nunnally  &  Bernstein  94]  and  is 
high  enough  to  use  in  practice. 
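
For readers who want to see how such an alpha value is obtained, the following is a minimal Python sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores). The 0/1 coding of KPA ratings and the example rows are hypothetical and are not taken from the assessment data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a list of cases, each a list of k item scores."""
    k = len(rows[0])
    # Sample variance of each item (column) across cases.
    item_variances = [variance([row[i] for row in rows]) for i in range(k)]
    # Sample variance of the total score (row sum) across cases.
    total_variance = variance([sum(row) for row in rows])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical ratings: 1 = KPA satisfied, 0 = not satisfied, one row per organization.
ratings = [
    [1, 1, 1, 1, 1],
    [1, 1, 1, 0, 1],
    [0, 0, 0, 0, 0],
    [1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0],
    [1, 1, 1, 1, 1],
]
print(round(cronbach_alpha(ratings), 3))
```

When every item moves in lockstep across cases, the formula returns exactly 1, which is the pattern behind an alpha of 1 for a dimension.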


Table 3: Cronbach’s Alpha and Bootstrap Results in KPAs at Maturity Level 2

                              Observed      Bootstrap
                              Cronbach’s    Cronbach’s
                              alpha         alpha         Bias      SE        95% ECI
Dimension 1
(“Project Implementation”)    0.9495        0.9499        0.0004    0.0077    [0.934, 0.964]

Figure 8 shows the distribution of the bootstrap mean of Cronbach’s alpha in the maturity
level 2 KPAs. The dotted and solid vertical lines in Figure 8 denote a bootstrap value of
0.9499 and observed value* of 0.9495, respectively. The absolute difference of 0.0004 between
the observed and bootstrap alpha values is small relative to the SE value of 0.0077.

This satisfies the criterion for evaluating stability [Efron & Tibshirani 93]. The 95% ECI is
[0.934, 0.964]. Thus, we can conclude that the resulting alpha value in KPAs of maturity
level 2 is fairly stable. The lower limit of the CI is greater than the recommended value of 0.9.


* The term observed implies “without bootstrap resampling.”
[Figure 8 plot; x-axis: Internal-consistency reliability, 0.92 to 0.97]

Figure  8:  Cronbach’s  Alpha  in  Maturity  Level  2-3  KPAs 

As  seen  in  Table  1, 408  assessments  covered  12  of  the  13  KPAs  in  maturity  levels  2  and  3 
(i.e.,  5  KPAs  at  level  2  and  7  KPAs  at  level  3).  Of  those  organizations,  120  (29.41%)  were 
assessed  at  level  1  or  2.  The  remaining  288  organizations  were  assessed  at  maturity  level  3, 4, 
or  5. 


[Scree plot; x-axis: Component Number]

Figure  9:  The  Scree  Plot  of  the  12  KPAs  at  Maturity  Levels  2-3 
The scree plot of the 12 KPAs in Figure 9 shows a two-dimensional structure, where the
eigenvalues greater than 1 are 7.11 and 2.65, respectively. Their factor loadings in Table 4
show that the KPAs at maturity levels 2 and 3 form two separate dimensions. This suggests
that the 12 KPAs are in fact fairly well grouped and defined in the alignment postulated by
the SW-CMM maturity level definitions. We call the second factor “Organization
Implementation,” which corresponds to the KPAs at maturity level 3.

This  result  is  consistent  with  the  one  from  dataset  1,  which  covers  the  KPAs  at  maturity  level 
2  only.  The  proportion  of  variance  explained  by  the  two  factors  is  81.27%,  i.e.,  22.06%  + 
59.21%. 


Table 4: Factor Loadings for the KPAs at Maturity Levels 2-3

KPAs at maturity levels 2-3*                    Factor 1             Factor 2
                                                (“Project            (“Organizational
                                                Implementation”)     Implementation”)
Requirements Management (2)                     0.85                 0.17
Software Project Planning (2)                   0.92                 0.18
Software Project Tracking and Oversight (2)     0.89                 0.19
Software Quality Assurance (2)                  0.86                 0.25
Software Configuration Management (2)           0.88                 0.22
Organization Process Focus (3)                  0.32                 0.78
Organization Process Definition (3)             0.18                 0.92
Training Program (3)                            0.21                 0.88
Integrated Software Management (3)              0.17                 0.92
Software Product Engineering (3)                0.19                 0.89
Intergroup Coordination (3)                     0.17                 0.85
Peer Review (3)                                 0.20                 0.87
% of variance explained                         22.06%               59.21%

* The number in parentheses after each KPA denotes its maturity level.

Table 5 shows observed and bootstrap alpha values and 95% confidence intervals for each of
the two dimensions. The alpha values are greater than the recommended value of 0.9.
Comparisons of the bias and the SE values show stability of the two alpha values. The bootstrap
distribution (not shown) also corroborates the stability of the alpha values.


Table 5: Cronbach’s Alpha and Bootstrap Results in KPAs at Maturity Levels 2-3

                                   Observed      Bootstrap
                                   Cronbach’s    Cronbach’s
                                   alpha         alpha         Bias      SE       95% ECI
Dimension 1
(“Project Implementation”)         0.942         0.939         -0.003    0.019    [0.895, 0.969]
Dimension 2
(“Organization Implementation”)    0.960         0.959         -0.001    0.005    [0.950, 0.968]
4.2.2 Cronbach’s Alpha in Maturity Level 2-4 KPAs

As shown in Table 1, 14 KPAs at maturity levels 2 through 4 (i.e., 5 KPAs at level 2, 7 KPAs
at level 3, and 2 KPAs at level 4) were rated in 156 organizations. Among them, 44
organizations did not attain maturity level 4. The remaining 112 organizations were determined to be
at maturity level 4 or 5.

The scree plot of the 14 KPAs in Figure 10 shows a break after the first three components,
with eigenvalues of 8.061, 3.657, and 1.543, respectively. The scree plot and the
eigenvalue-greater-than-one rule suggest a three-dimensional structure of the maturity concept based on
these KPAs. Table 6 demonstrates high factor loadings for each of the three dimensions,
which in fact correspond to the structure postulated by the SW-CMM for these KPAs. We
conclude that each stage of maturity levels 2 through 4 corresponds to a separate dimension
of the maturity concept. We call the third dimension “Quantitative Process Implementation.”
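
The eigenvalue-greater-than-one rule and the variance proportions in Table 6 can be checked with a small Python sketch. The function name is hypothetical, and the fourth eigenvalue (0.25) is a made-up placeholder included only to show a component being dropped; the report gives only the three retained eigenvalues.

```python
def retained_components(eigenvalues, n_items):
    """Keep eigenvalues above 1 and report each one's % of variance explained."""
    kept = [ev for ev in eigenvalues if ev > 1.0]
    # With standardized items, total variance equals the number of items.
    return [(ev, round(100 * ev / n_items, 2)) for ev in kept]


# Reported eigenvalues for the 14 KPAs, plus one hypothetical small eigenvalue.
for ev, pct in retained_components([8.061, 3.657, 1.543, 0.25], n_items=14):
    print(ev, pct)
```

The three retained components reproduce the 57.58%, 26.12%, and 11.02% figures shown in the last row of Table 6.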


[Scree plot; x-axis: Component Number]

Figure  10:  The  Scree  Plot  of  the  14  KPAs  at  Maturity  Levels  2-4 

Cronbach’s  alpha  and  bootstrap  values  are  shown  in  Table  7.  The  alpha  values  for  each  of  the 
three  dimensions  are  1,  0.981,  and  0.968,  which  are  very  acceptable  levels.  The  values  of  bias 
and  SE  indicate  stability  of  the  three  alpha  values. 
Table 6: Factor Loading for the KPAs at Maturity Levels 2-4

KPAs in maturity levels 2-4                     Factor 1            Factor 2            Factor 3
                                                (“Project           (“Organizational    (“Quantitative Process
                                                Implementation”)    Implementation”)    Implementation”)
Requirements Management (2)                     0.98                0.18                0.04
Software Project Planning (2)                   0.98                0.18                0.04
Software Project Tracking and Oversight (2)     0.98                0.18                0.04
Software Quality Assurance (2)                  0.98                0.18                0.04
Software Configuration Management (2)           [illegible]         0.18                0.04
Organization Process Focus (3)                  0.20                0.93                0.12
Organization Process Definition (3)             0.15                0.94                0.14
Training Program (3)                            0.18                [illegible]         0.15
Integrated Software Management (3)              0.14                [illegible]         0.18
Software Product Engineering (3)                0.16                0.92                0.15
Intergroup Coordination (3)                     0.15                0.94                0.14
Peer Review (3)                                 0.20                0.93                0.12
Quantitative Process Management (4)             0.04                0.25                0.95
Software Quality Management (4)                 0.05                0.22                0.96
% of variance explained                         26.12%              57.58%              11.02%

Table 7: Cronbach’s Alpha and Bootstrap Results in KPAs at Maturity Levels 2-4

                                   Observed      Bootstrap
                                   Cronbach’s    Cronbach’s
                                   alpha         alpha         Bias      SE       95% ECI
Dimension 1
(“Project Implementation”)         1             1             -         -        -
Dimension 2
(“Organization Implementation”)    0.981         0.977         -0.004    0.039    [0.924, 1]
Dimension 3
(“Quantitative Process
Implementation”)                   0.968         0.967         -0.001    0.017    [0.911, 1]

4.2.3  Cronbach’s  Alpha  in  Maturity  Level  2-5  KPAs 

A total of 78 organizations were assessed through maturity level 5. Among them, 54
organizations were determined to be at level 5. The scree plot of the 17 KPAs in Figure 11 shows a
three-dimensional structure of the maturity concept, with eigenvalues greater than 1 of 9.94,
4.07, and 2.31, respectively. The factor loadings in Table 8 show that the first two dimensions
remain the same as in our previous results; however, the KPAs at maturity levels 4 and 5
form a single dimension. We have called the third dimension “Quantitative Process
Implementation.” At least in this dataset, the KPA profiles across maturity levels 4 and 5 are in fact
closely interrelated.
[Scree plot; x-axis: Component Number]

Figure 11: The Scree Plot of the 17 KPAs at Maturity Levels 2-5


Table 8: Factor Loading for KPAs at Maturity Levels 2-5

KPA                                             Factor 1            Factor 2            Factor 3
                                                (“Project           (“Organizational    (“Quantitative Process
                                                Implementation”)    Implementation”)    Implementation”)
Requirements Management (2)
Software Project Planning (2)
Software Project Tracking and Oversight (2)
Software Quality Assurance (2)
Software Configuration Management (2)
Organization Process Focus (3)
Organization Process Definition (3)
Training Program (3)
Integrated Software Management (3)
Software Product Engineering (3)
Intergroup Coordination (3)
Peer Review (3)
Quantitative Process Management (4)
Software Quality Management (4)
Defect Prevention (5)
Technology Change Management (5)
Process Change Management (5)
% of variance explained                         23.93%              58.44%              13.59%

[Individual loading values illegible in source]


Once again, the Cronbach’s alpha values of 1, 0.995, and 0.970 are at very acceptable levels.
As shown in Table 9, bootstrap results verify the stability of the three alpha values.
Table 9: Cronbach’s Alpha and Bootstrap Results in KPAs at Maturity Levels 2-5

                                   Observed       Bootstrap
                                   Cronbach’s     Cronbach’s
                                   alpha          alpha          Bias           SE             95% ECI
Dimension 1
(“Project Implementation”)         [illegible]    [illegible]    [illegible]    [illegible]    [illegible]
Dimension 2
(“Organization Implementation”)    0.995          [illegible]    [illegible]    [illegible]    [illegible]
Dimension 3
(“Quantitative Process
Implementation”)                   0.970          [illegible]    [illegible]    0.011          [0.947, 0.989]
5 Discussion and Conclusions


5.1  Interpreting  the  Results 

First and foremost, our results provide confidence in the internal consistency of the KPAs of
the SW-CMM. The consistently high values of Cronbach’s alpha show that CBA IPI
assessment teams typically do in fact make rating judgments that are internally consistent with the
structure of the SW-CMM.


Our factor analytic results also lend credibility to the structure of the SW-CMM itself. As the
model posits, there is independent empirical evidence that affinity groupings of KPAs should
be considered as separate dimensions of the capability maturity construct. Moreover, the
“Project Implementation” and “Organization Implementation” dimensions are entirely
congruent with the KPAs at maturity levels 2 and 3, respectively.


More does need to be learned, however, about the distinctions between levels 4 and 5. The
maturity level 4 and 5 KPAs form a single dimension that we have termed “Quantitative
Process Implementation.” The KPA profiles across maturity levels 4 and 5 are in fact closely
interrelated in the dataset we analyzed. The KPAs at maturity levels 4 and 5 all necessitate
statistical thinking and the careful use of quantitative analytic methods, so it does make sense
that their satisfaction profiles are closely interrelated. Still, our present results are based on a
relatively small number of high maturity organizations, and one certainly could argue that it
remains best practice to master the level 4 KPAs first. Clearly, more and better evidence is
needed to provide a fuller understanding of the nuances in the results, along with
opportunities for improvement in the model and appraisal methods.

Recall  from  our  literature  review  in  Section  2  that  the  concept  of  ISO/IEC  15504  capability 
consists  of  a  two  dimensional  structure  [El-Emam  98,  Jung  &  Hunter  02].  Our  own  results 
support  the  validity  of  separate  dimensions  corresponding  to  maturity  levels  2  and  3,  respectively.
The  existence  of  an  additional  dimension  may  reflect  the  difference  between  organizational
maturity  as  characterized  by  the  SW-CMM  and  process  capability  as  characterized  by
ISO/IEC  15504.  Note  too  that  both  the  15504  studies  and  our  own  suggest  the  existence  of  a
similar  “Quantitative  Process  Implementation”  construct.  Such  similarities  based  on  disparate
methods  and  data  provide  additional  confidence  about  both  sets  of  results.

Regardless,  the  dimensionality  that  we  found  holds  consistently  for  four  datasets,  and  internal
consistency  as  estimated  by  Cronbach’s  alpha  is  consistently  high  for  each  of  the  three  dimensions.
Although  the  KPAs  at  levels  4  and  5  map  to  the  same  dimension  in  our  current
analysis,  we  can  safely  conclude  that  each  stage  of  the  SW-CMM  corresponds  to  a  dimension
of  organizational  maturity  and  that  the  KPAs  in  each  maturity  level  are  good  items  for  measuring
the  maturity  of  a  common  underlying  construct.


5.2  Methodological  Issues 

5.2.1  Rating  Scales 

It  is  well  known  that  the  choice  of  rating  scale  affects  estimates  of  internal  consistency.  If  too 
few  categories  are  used,  the  rating  scale  does  not  capture  the  full  discriminatory  power  of  the 
measures.  On  the  other  hand,  using  too  many  categories  may  be  beyond  the  limited  discriminatory
powers  of  assessors.

In  a  reliability  study  of  attitude  scales  with  3,  5,  and  7  categories,  Likert  and  Roslow  concluded
that  the  five-point  scales  consistently  produced  higher  reliability  estimates  than  did
the  others.*  Similarly,  a  Monte  Carlo  study  of  the  effects  on  reliability  of  the  number  of  scale
points  showed  that  reliability  estimates  increased  as  the  number  of  scale  points  increased
from  two  to  five,  but  the  estimates  decreased  as  more  categories  were  added  [Lissitz  &  Green
75].  Another  similar  study  reached  the  same  conclusions  [Van  de  Ven  &  Ferry  80].  Finally,
Jung  and  Hunter  showed  that  the  current  four-point  scale  to  rate  ISO/IEC  15504  process  attributes
cannot  be  improved  in  terms  of  internal  consistency  by  reducing  it  to  3  or  2  categories
[Jung  &  Hunter  02].

Assessments  based  on  the  SW-CMM  typically  have  used  a  two-point  scale  of  “Fully  Satisfied”
and  “Not  Satisfied”  to  measure  the  extent  of  achievement  of  a  KPA  goal.  Krishnan  and
Kellner  addressed  the  difficulty  in  using  a  two-category  scale  when  there  is  inconsistency  in
the  extent  to  which  SW-CMM  activities  and  practices  are  implemented  and  institutionalized
[Krishnan  &  Kellner  99].  Based  on  an  earlier  proposal,  they  used  a  five-point  scale,  since  two
categories  cannot  measure  partial  achievement.†  The  Standard  CMMI  Appraisal  Method  for
Process  Improvement  (SCAMPI℠)  now  uses  a  very  similar  four-point  scale  to  characterize
practice  implementation  [SEI 01].  We  next  propose  to  replicate  and  extend  this  study  for
CMMI-based  appraisals.


*  Likert,  R.  &  Roslow,  S.  The  Effects  upon  the  Reliability  of  Attitude  Scales  of  Using  Three,  Five,
Seven  Alternatives.  Working  paper.  New  York  University,  1934.

†  This  is  taken  from  a  1994  presentation  by  Dennis  Goldenson  titled  “A  Multiple  Response  Scale
for  Process  Measurement.”

℠  SCAMPI  is  a  service  mark  of  Carnegie  Mellon  University.

5.2.2  Explaining  the  High  Internal  Consistency

How  can  we  explain  the  high  Cronbach’s  alpha  value  over  0.9  for  each  of  the  three  dimensions
of  capability  maturity?  The  likeliest  explanation  is  that  our  data  come  from  well-trained
assessors  using  a  standardized  assessment  procedure  (CBA IPI)  against  a  well  understood 
reference  model  (the  SW-CMM).  The  importance  of  both  training  and  a  standardized  method 
is  described  elsewhere  [El-Emam  &  Madhavji  95,  Trochim  01]. 

Though  it  is  difficult  to  directly  compare  the  Cronbach’s  alpha  values,  those  reported  here  are 
consistently  higher  than  those  found  in  similar  previous  studies.  It  seems  natural,  though,  that 
data  drawn  from  actual  assessments  would  be  more  internally  consistent  than  those  from  a 
Delphi  panel,  questionnaires,  or  a  mail  survey.  A  full-scale  assessment  is  based  on  several 
sources  of  objective  evidence,  which  gives  the  appraisal  team  the  opportunity  to  clarify  any 
apparent  ambiguities. 


5.3  Future  Research 

Although  the  SEI’s  PAIS  database  retains  the  largest  number  of  assessment  cases  available
anywhere,  the  dataset  is  not  a  random  sample,  and  our  results  cannot  be  generalized  to  all 
SW-CMM  assessments  conducted  around  the  world.  This  sometimes  is  referred  to  as  a  threat 
to  external  validity,  due  to  the  type  of  sampling  [Trochim  01].  Hence  interpretation  of  our 
results  should  rightly  be  limited  to  CBA  IPI  assessments  reported  to  PAIS  by  the  current  base 
of  CMM  users.  However,  if  any  other  SW-CMM  assessments  also  satisfy  the  same  requirements
specified  by  CBA  IPI,  then  they  too  can  be  expected  to  show  internal  consistency  in  line
with  our  results.


No  single  study  can  be  fully  definitive.  Similar  studies  must  be  conducted  that  include  assessment
results  that  may  not  be  well  represented  currently  in  the  PAIS  database.  Such  studies
should  include  sample  surveys  as  well  as  results  from  mini  assessments  and  similar
“lighter  weight”  appraisals  conducted  on  organizations  that  are  not  yet  ready  to  invest  in  a 
full,  comprehensive  appraisal. 

Generalizing  findings  about  internal  consistency  is  not  easy.  “Even  if  a  high  level  of  internal
consistency  is  achieved  in  the  initial  item  analysis,  it  is  a  good  idea  to  replicate  it  in  subsequent
samples.  Availability  of  reliability  estimates  across  different  types  of  samples  will  expand
the  generalization  of  the  scales’  reliability....  There  should  be  little  variation  in  the
magnitude  of  the  internal  consistency  from  sample  to  sample”  [Spector  92,  p.  65].  Changes
over  time  and  perturbations  in  the  ways  SW-CMM  assessments  are  conducted  around  the 
world  require  careful  monitoring  of  internal  consistency.  Replications  are  necessary  to  raise 
the  confidence  in  any  findings. 

Additional  empirical  work  also  is  necessary  to  help  us  better  understand  the  underlying  dimensional
structure  of  capability  maturity.  This  study  will  be  replicated  as  soon  as  sufficient
data  have  been  reported  for  CMMI  appraisals.  Patterns  of  anomalies  in  which  goals  are
satisfied  out  of  the  order  prescribed  by  CMM  models  should  also  be  examined  more  closely,
possibly  using  data  from  other  sources.

The  high  internal  consistency  demonstrated  among  related  KPAs  suggests  another  question 
worthy  of  further  study.  Namely,  is  it  possible  to  achieve  comparable  levels  of  reliability  by 
sampling  from  within  each  dimension?  Of  course,  organizations  that  are  concerned  about  a 
particular  KPA  must  consider  the  evidence  in  full  detail.  Is  it  possible,  though,  to  make  sufficiently
confident  statements  about  maturity  levels  for  some  purposes  by  examining  evidence
from  a  subset  of  model  practices? 

Appendix  A  A  Brief  Review  of  the  CMM  for  Software  and  Appraisal  Methods


The  SW-CMM 

The  SW-CMM,  on  which  process  assessments  that  we  have  studied  are  based,  codifies  what 
many  experts  believe  to  be  best  practices  of  software  engineering.  The  SW-CMM  was  the 
first  in  what  has  become  a  family  of  Capability  Maturity  Models.  These  include  most  notably 
the  Software  Acquisition  Capability  Maturity  Model  (SA-CMM)  [Cooper  &  Fisher  02],  the 
Systems  Engineering  Capability  Maturity  Model  (SE-CMM)  [Bate  et  al.  95],  the  People  Capability
Maturity  Model  (P-CMM)  [Curtis  et  al.  01],  and  the  Integrated  Product  Development
Capability  Maturity  Model  (IPD-CMM).  Much  of  the  content  of  these  models  now  is  being
superseded  by  the  suite  of  CMM  Integration  (CMMI)  models  [SEI 02]. 


Early  versions  of  Humphrey’s  maturity  framework  appear  in  technical  reports  [Humphrey  87, 
Humphrey  &  Sweet  87],  in  papers  [Humphrey  88],  and  in  his  book  Managing  the  Software
Process  [Humphrey  89].  A  preliminary  maturity  questionnaire  [Humphrey  &  Sweet  87]  was 
released  in  1987  as  a  tool  to  provide  organizations  with  a  way  to  characterize  the  maturity  of 
their  software  processes.  The  questionnaire  was  not  recommended  as  a  stand-alone  tool  for 
estimating  the  maturity  of  an  organization.  Rather,  it  was  meant  as  an  orientation  tool  to  be 
used  during  an  assessment  [Olson  et  al.  1989].  All  of  these  efforts  were  initiated  in  response 
to  a  request  to  provide  the  federal  government  with  a  method  for  assessing  the  capability  of 
its  software  contractors. 


After  four  years  of  experience  with  the  software  process  maturity  framework  and  the  preliminary
version  of  the  maturity  questionnaire,  the  SEI  and  its  affiliates  evolved  the  software
process  maturity  framework  into  a  fully  defined  model:  SW-CMM  Version  1.0.  Two  technical
reports,  Capability  Maturity  Model  for  Software  [Paulk  et  al.  91]  and  Key  Practices  for  the
Capability  Maturity  Model  for  Software  [Weber  et  al.  91],  formalized  the  description  of  the
maturity  levels  in  terms  of  KPAs.  Based  on  feedback  from  the  software  community,
Version  1.0  evolved  into  Version  1.1,  published  as  two  technical  reports:  Capability  Maturity  Model
for  Software,  Version  1.1  [Paulk  et  al.  93a]  and  Key  Practices  for  the  Capability  Maturity
Model  for  Software  [Paulk  et  al.  93b].  A  detailed  history  can  be  found  in  the  SEI  publications
[Paulk  et  al.  94,  Paulk  95]. 

The  SW-CMM  provides  a  framework  for  organizing  software  processes  into  five  evolutionary
steps,  or  maturity  levels,  which  lay  successive  foundations  for  continuous  process  improvement
(see  Table  10  [Paulk  99]).  The  SW-CMM  covers  practices  for  planning,  engineering,
and  managing  software  development  and  maintenance.  When  followed,  these  key
practices  are  meant  to  improve  the  ability  of  organizations  to  meet  goals  for  cost,  schedule, 
functionality,  product  quality,  and  other  performance  objectives. 


Table  10:  Maturity  Levels  and  their  Key  Process  Areas

Level                  Focus                             Key Process Areas
---------------------  --------------------------------  -----------------------------------------
Level 5 (Optimizing)   Continuous process improvement    - Defect Prevention
                                                         - Technology Change Management
                                                         - Process Change Management

Level 4 (Managed)      Product and process quality       - Quantitative Process Management
                                                         - Software Quality Management

Level 3 (Defined)      Engineering processes and         - Organization Process Focus
                       organizational support            - Organization Process Definition
                                                         - Training Program
                                                         - Integrated Software Management
                                                         - Software Product Engineering
                                                         - Intergroup Coordination
                                                         - Peer Review

Level 2 (Repeatable)   Project management processes      - Requirements Management
                                                         - Software Project Planning
                                                         - Software Project Tracking and Oversight
                                                         - Software Subcontract Management
                                                         - Software Quality Assurance
                                                         - Software Configuration Management

Level 1 (Initial)      Competent people (and heroics)

Table  10  and  Figure  12  show  the  structure  of  the  Capability  Maturity  Model.  As  shown  in
Table  10,  with  the  exception  of  Level  1,  each  maturity  level  is  composed  of  several  KPAs  that
indicate  the  areas  on  which  an  organization  should  focus  to  improve  its  software  process.
Each  KPA  is  organized  into  five  common  features:  Commitment  to  Perform,  Ability
to  Perform,  Activities  Performed,  Measurement  and  Analysis,  and  Verifying  Implementation.
These  attributes  serve  to  indicate  whether  the  implementation  and  institutionalization  of
a  KPA  is  likely  to  be  effective,  repeatable,  and  lasting.  The  common  features  specify  the  key
practices  that,  when  collectively  addressed,  are  meant  to  accomplish  the  goals  of  the  KPA.
The  key  practices  describe  activities  and  infrastructure  that  are  believed  to  contribute  most  to
the  effective  implementation  and  institutionalization  of  the  KPA.  The  SW-CMM  maturity
levels  denote  stages*  along  an  evolutionary  path  of  software  process  improvement.  Thus,  the
maturity  level  is  an  ordinal  scale  for  measuring  the  maturity  of  an  organization’s  software
process  and  for  evaluating  its  software  process  capability.


*  Software  process  assessment  models  in  which  only  certain  processes  are  assessed  at  each  (maturity)
level  are  referred  to  as  staged  models,  whereas  software  process  assessment  models  in  which
each  process  may  be  assessed  at  each  (capability)  level  are  referred  to  as  continuous  models.  Thus
the  SW-CMM  is  a  staged  model  and  ISO/IEC  15504  is  a  framework  for  continuous  models.  However,
some  processes  in  ISO/IEC  15504  are  strongly  related  to  particular  capability  levels,  so  the
distinction  between  continuous  and  staged  models  may  not  be  as  clear  as  it  may  at  first  appear.


Figure  12:  The  Structure  of  the  Capability  Maturity  Model 

Appraisal  Methods 

An  appraisal  method  describes  the  activities  and  procedures  that  need  to  be  conducted  during 
an  assessment.  It  also  includes  the  identification  of  the  assessment  sponsor,  the  purpose  and 
scope  of  the  assessment,  any  relevant  constraints,  assessment  responsibilities,  and  so  on. 

Several  appraisal  methods  have  been  developed  since  the  first  SEI-assisted  assessment  was 
conducted.  A  method  for  process  assessment,  called  Software  Process  Assessment  (SPA),  was 
commercialized  to  industry  and  government  licensees  in  1990.  SW-CMM  Version  1.0  and 
Version  1.1  were  published  in  1991  and  1993,  respectively.  Because  the  SPA  predated  publication
of  the  SW-CMM,  the  SEI  developed  the  CBA IPI  method  for  assessing  an  organization’s
software  process  capability.  CBA  IPI  Version  1.0  was  released  in  1995  and  updated  to  CBA
IPI  Version  1.1  in  1996  [Dunaway  96,  Dunaway  &  Masters  96].  CBA  IPI  officially  replaced
SPA  in  1995  through  expiration  of  all  SPA  licenses  effective  on  31  December  1995.  In  2002,
the  SEI  released  CBA  IPI  Version  1.2  [Dunaway  &  Masters  01].


The  similarities  and  differences  between  the  SW-CMM  and  ISO/IEC  15504  are  described  by  Paulk  [Paulk  99].
CMMI  models  each  have  two  representations,  continuous  and  staged.

After  an  assessment  that  is  done  for  purposes  of  internal  process  improvement,  the  senior
manager  of  the  assessed  organization  retains  the  assessment  findings  and  results,  and  generally
uses  them  to  formulate  an  action  plan  for  the  process  improvement  program.  Analysis  of
assessment  results  in  light  of  an  organization’s  business  needs  can  identify  the  strengths, 
weaknesses,  and  risks  inherent  in  that  organization’s  processes.  This,  in  turn,  often  leads  to  an 
improvement  initiative  that  aims  to  make  the  processes  more  effective  in  achieving  their 
goals,  and  to  prevent  significant  causes  of  poor  quality,  cost  overruns,  or  schedule  delay. 


CBA IPI  Version  1.1  specifies  minimum  requirements  for  the  composition  of  an  assessment 
team,  an  assessment  plan,  data  collection,  data  validation,  rating,  and  reporting  the  results. 
CBA  IPI  also  defines  three  phases  of  an  assessment.  The  first  phase  includes  the  activities 
necessary  to  plan  and  prepare  for  the  assessment  (see  Figure  13).  The  second  phase  consists  of
on-site  activities  for  conducting  the  assessment,  including  techniques  for  gathering,  organizing,
and  consolidating  data  (see  Figure  14).  The  final  phase  is  to  report  the  results.  Each
phase  is  described  in  CBA  IPI  Version  1.1  in  detail. 


Figure  13:  Pre-Onsite  Activities 

Figure  14:  Chronology  of  On-Site  Activities

Software  Capability  Evaluation  (SCE)  is  an  appraisal  method  that  is  meant  for  use  in  source 
selection  and  contract  monitoring  [Byrnes  &  Phillips  96].  SCE  offers  a  means  to  help  acquisition
managers  identify  program  risk  by  evaluating  software  process  capability  in  source
selection  and  manage  program  risk  by  motivating  contractors  to  improve  their  software  development
processes  without  forcing  compliance  to  specific  practices.  Sometimes,  SCEs  are
used  to  guide  internal  process  improvement. 


In  1995,  the  SEI  published  the  CMM  Appraisal  Framework  (CAF)  [Masters  &  Bothwell  95]. 
The  CAF  describes  the  common  requirements  used  for  developing  appraisal  methods  based 
on  SW-CMM  Version  1.1  and  also  provides  a  basis  for  comparing  assessment  and  evaluation 
results.  Both  the  CBA IPI  and  SCE  methods  are  CAF-compliant. 

Maturity  Level  Determination 

A  maturity  level  is  defined  in  terms  of  satisfaction  of  the  goals  of  the  KPAs  within  its  level. 
Maturity  levels  2-5  each  have  several  KPAs  associated  with  them,  as  shown  in  Table  10.  Each 
KPA  is  rated  as  either  “Fully  Satisfied”  or  “Not  Satisfied,”  and  the  maturity  level  is  determined
by  aggregating  the  KPA  ratings.


The  rating  scheme  in  CMM  assessments  allows  the  aggregation  of  judgments  across  several 
projects  to  produce  a  single  maturity  level  for  the  entire  organization  within  the  scope  of  the 
assessment.  An  organization  is  defined  to  be  at  maturity  level  k  if  all  KPAs  up  to  and  including
maturity  level  k  are  rated  as  “Fully  Satisfied.”  For  example,  to  determine  whether  an
organization  has  achieved  maturity  level  2  or  not,  it  is  necessary  to  determine  the  ratings 
achieved  by  the  six  KPAs  defined  at  maturity  level  2.  An  organization  that  fails  to  achieve 
maturity  level  2  is  determined  to  be  at  maturity  level  1. 
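
The aggregation rule just described can be sketched in a few lines of Python. This is only an illustration of the rule as stated here, not part of the CBA IPI method itself: the function name `maturity_level`, the dictionary layout, and the choice to treat an unrated KPA as "Not Satisfied" are all assumptions made for the example. The KPA names are transcribed from Table 10.

```python
from typing import Dict, List

# KPAs per maturity level, transcribed from Table 10.
KPAS_BY_LEVEL: Dict[int, List[str]] = {
    2: [
        "Requirements Management",
        "Software Project Planning",
        "Software Project Tracking and Oversight",
        "Software Subcontract Management",
        "Software Quality Assurance",
        "Software Configuration Management",
    ],
    3: [
        "Organization Process Focus",
        "Organization Process Definition",
        "Training Program",
        "Integrated Software Management",
        "Software Product Engineering",
        "Intergroup Coordination",
        "Peer Review",
    ],
    4: ["Quantitative Process Management", "Software Quality Management"],
    5: [
        "Defect Prevention",
        "Technology Change Management",
        "Process Change Management",
    ],
}


def maturity_level(fully_satisfied: Dict[str, bool]) -> int:
    """Return the highest level k such that every KPA at levels 2..k is Fully Satisfied.

    Assumption of this sketch: a KPA missing from the input counts as Not Satisfied.
    """
    level = 1  # Level 1 has no KPAs, so it is the floor.
    for k in sorted(KPAS_BY_LEVEL):
        if all(fully_satisfied.get(kpa, False) for kpa in KPAS_BY_LEVEL[k]):
            level = k
        else:
            break  # A gap at level k caps the result, whatever higher levels show.
    return level
```

Note that under this rule an organization with every level 4 and 5 KPA satisfied but one level 3 KPA unsatisfied is still placed at maturity level 2, which is what makes the scale staged.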

Appendix  B  Reliability  in  Measuring  Capability  Maturity*


Reliability  estimation  in  measurement  has  been  based  on  classical  test  (score)  theory
[Carmines  &  Zeller  79].  The  theory  represents  an  observed  rating  X  as  the  sum  of  two  additive
components,  a  true  rating  T  and  an  error  term  e,  i.e.,  X  =  T  +  e,  which  is  a  special  case  in
which  there  is  no  systematic  error.  Since  the  measurement  errors  e  are  random,  the  observed
ratings  are  sometimes  higher  and  sometimes  lower  than  the  true  rating.  Therefore,  in  the  long  run,  the
mean  of  the  error  terms  becomes  zero.  The  reliability  of  measurement  is  defined  as  the  ratio
of  true  to  observed  variance.  If  no  random  error  is  involved  in  the  measurement  (i.e.,  the
variance  of  the  error  term  is  zero,  var(e)  =  0),  then  the  reliability  equals  1.  If  the  observed  variance
is  equal  to  the  random  variance  (i.e.,  var(T)  =  0),  it  implies  zero  reliability  in  the  measurement.
Since  the  variance  of  the  true  rating,  var(T),  cannot  be  measured,  it  is  estimated.
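
Written out, the definition above takes the following form. The decomposition of the observed variance assumes that the true rating and the error term are uncorrelated, which is the usual classical test theory assumption but is not stated explicitly in the text. The numbers in the last line are purely illustrative.

```latex
\[
  X = T + e, \qquad
  \mathrm{var}(X) = \mathrm{var}(T) + \mathrm{var}(e)
  \quad \text{(assuming $T$ and $e$ are uncorrelated)}
\]
\[
  \text{reliability}
  \;=\; \frac{\mathrm{var}(T)}{\mathrm{var}(X)}
  \;=\; \frac{\mathrm{var}(T)}{\mathrm{var}(T) + \mathrm{var}(e)}
\]
% Illustrative values only: var(T) = 9 and var(e) = 1
\[
  \frac{9}{9 + 1} = 0.9
\]
```

The two boundary cases in the text follow directly: var(e) = 0 gives a ratio of 1, and var(T) = 0 gives a ratio of 0.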


Reliability  estimation  methods  such  as  test-retest,  alternative-form,  split-half,  and  internal 
consistency  (Cronbach’s  alpha)  can  be  categorized  into  stability  (repeatability)  and  equivalence
(consistency)  by  the  basic  strategies  used  to  evaluate  reliability  [Zeller  &  Carmines  80].

Stability  (repeatability)  implies  that  repeated  assessments  of  the  same  process,  by  the  same 
assessor  using  the  same  or  an  alternative  instrument,  at  two  different  points  in  time,  should 
produce  results  that  can  be  accepted  as  being  identical  [Zeller  &  Carmines  80].  It  is  estimated 
by  utilizing  the  test-retest  method  or  the  alternative-form  method.  Consistency  (equivalence) 
in  a  measurement  focuses  on  multiple  items  of  a  concept  measured  at  a  single  point  in  time, 
where  each  item  is  considered  a  separate  but  equivalent  measure  of  the  underlying  concept. 
The  split-half  and  internal  consistency  methods  are  used  to  measure  consistency. 

The  first  two  methods  (test-retest  and  alternative-form)  for  repeatability  have  major  limitations
in  process  assessments,  such  as  requiring  two  consecutive  assessments  by  the  same  assessors
with  the  same  instrument  (in  test-retest)  or  two  different  instruments  (in  alternative-form).
The  split-half  method  has  difficulties  in  dividing  items  and  can  be  considered  as  a  specific
case  of  Cronbach’s  alpha.  The  four  methods  are  explained  in  the  context  of  software
process  assessments  in  detail  elsewhere  [El-Emam  &  Goldenson  00].  El-Emam  &  Goldenson 
consider  internal  consistency  to  be  the  most  appropriate  method  for  measuring  reliability  in 
software  process  assessments. 
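
For readers who want to see what an internal consistency estimate involves, the following is a minimal sketch of the standard Cronbach's alpha formula, alpha = k/(k-1) * (1 - sum of item variances / variance of total scores), where k is the number of items. It is not the analysis code used in this study. The function name `cronbach_alpha` and the layout of the input (one row per assessment, one numeric column per KPA rating, for instance 1 for "Fully Satisfied" and 0 for "Not Satisfied") are assumptions made for the example.

```python
from statistics import variance
from typing import List, Sequence


def cronbach_alpha(ratings: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a cases-by-items matrix of numeric ratings."""
    if len(ratings) < 2:
        raise ValueError("need at least two cases")
    n_items = len(ratings[0])
    if n_items < 2:
        raise ValueError("need at least two items")
    if any(len(row) != n_items for row in ratings):
        raise ValueError("every case must rate the same number of items")

    # Sample variance of each item (column) across cases.
    item_vars: List[float] = [
        variance([row[j] for row in ratings]) for j in range(n_items)
    ]
    # Sample variance of the per-case total score.
    total_var = variance([sum(row) for row in ratings])
    if total_var == 0:
        raise ValueError("total scores do not vary; alpha is undefined")

    return n_items / (n_items - 1) * (1 - sum(item_vars) / total_var)
```

Because the same divisor appears in the item variances and the total variance, using sample or population variance gives the same alpha. When every item moves in lockstep across cases, the function returns 1.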


*  This  section  is  partially  based  on  materials  from  Jung  [Jung  2002a].

Since  there  is  more  than  one  method  for  estimating  reliability,  a  method  must  be  selected.
The  selection  should  consider  the  advantages  and  disadvantages  of  each  method  in  a  specific
application  field,  as  well  as  perceptions  in  the  research  community  about  what  are  appropriate
methods  of  reliability  estimation.  The  research  communities  related  to  software  processes  are
management  information  systems  (MIS)  and  software  engineering.  In  both  communities,  internal
consistency  is  the  most  popular  method  for  estimating  reliability  of  measurement.

Cronbach’s  alpha  has  been  used  for  many  years  to  estimate  the  reliability  of  MIS  measurement
instruments.  Examples  include  software  processes  and  their  outcomes  [Subramanian  &
Nilakanta  94],  systems  effectiveness  [Srinivasan  85],  user  information  satisfaction  [Ives  et  al. 
83,  Tait  &  Vessey  88],  user  involvement  [Amoako-Gyampah  &  White  93,  Baroudi  et  al.  86], 
perceived  ease  of  use  and  usefulness  of  software  [Adams  et  al.  92,  Davis  89],  information 
system  service  quality  [Jiang  et  al.  00,  Van  Dyke  et  al.  99],  evaluation  of  information  systems 
[Goodhue  98],  organizational  benefits  of  IS  projects  [Mirani  &  Lederer  98],  and  satisfaction 
with  high-speed  networks  [Eum  et  al.  2001].  Sethi  and  King  note  that  Cronbach’s  alpha  is  the 
most  important  method  for  estimating  reliability  of  instruments  [Sethi  &  King  91].  Test-retests
have  been  used  to  measure  reliability  of  a  user  information  satisfaction  instrument
[Galletta  &  Lederer  89]  and  of  a  user  involvement  instrument  [Torkzadeh  &  Doll  94]. 

Internal  consistency  is  the  most  popular  method  to  estimate  reliability  in  software  engineering
as  well.  For  example,  Cronbach’s  alpha  was  used  for  estimating  the  internal  consistency
of  the  1987  maturity  questionnaire  and  the  ISO/IEC  15504  capability  dimension  [Fusaro  et
al.  98],  of  an  organizational  maturity  instrument  [El-Emam  &  Madhavji  95],  of  key  success
factors  in  SPI  [Dyba  00,  El-Emam  et  al.  01],  and  of  the  ISO/IEC  PDTR  15504  capability  dimension
[El-Emam  98,  Jung  &  Hunter  02].

Interrater  agreement  also  has  been  used  to  estimate  the  reliability  of  software  process  appraisals.
Estimates  are  made  of  the  extent  to  which  two  assessors  or  teams  of  assessors  agree
when  making  independent  judgments  about  the  same  software  engineering  processes.  Internal 
consistency  and  interrater  agreement  are  sometimes  called  internal  reliability  and  external 
reliability,  respectively  [Fusaro  et  al.  98]. 

In  practice,  the  subjective  nature  of  ratings  makes  it  unlikely  that  there  will  be  perfect  interrater
agreement;  however,  a  series  of  such  studies  conducted  as  part  of  the  international
SPICE  trials  does  show  reasonably  high  levels  of  interrater  agreement  [El-Emam  99,  El-Emam
&  Goldenson  00,  Jung  et  al.  01].  More  recent  work  by  Jung  provides  further  discussion
about  paradoxes  in  the  interpretation  of  the  Kappa  coefficient  that  is  used  in  the  SPICE
studies.†


†  Described  in  an  article  by  Ho-Won  Jung,  titled  “Evaluating  the  Interrater  Agreement  in  SPICE-based
Software  Process  Assessments,”  which  was  under  review  at  the  time  of  this  publication.
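
As a rough illustration of what a kappa coefficient measures, the sketch below computes Cohen's kappa, a common chance-corrected agreement statistic for two raters: kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's marginal proportions. Whether the SPICE studies used exactly this variant is not stated in this report, so treat the choice of Cohen's kappa, the function name `cohens_kappa`, and the input layout as assumptions of the example.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two raters' categorical ratings of the same items."""
    if len(rater_a) != len(rater_b) or len(rater_a) == 0:
        raise ValueError("both raters must rate the same non-empty set of items")
    n = len(rater_a)

    # Observed agreement: share of items given the same rating by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n

    # Chance agreement: product of the raters' marginal proportions, per category.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in counts_a) / (n * n)

    if p_chance == 1:
        raise ValueError("both raters used one identical category; kappa is undefined")
    return (p_observed - p_chance) / (1 - p_chance)
```

The dependence on the marginal proportions is one reason kappa can look low even when raw agreement is high, for example when nearly all items fall into a single category.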

Additionally,  a  CMMI  pilot  appraisal  produced  remarkably  similar  results  from  two
parallel  teams  that  independently  prepared  observations,  findings,  and  goal  ratings  for  the
entire  scope  of  a  maturity  level  5  appraisal.‡  For  example,  only  5  of  79  goal  ratings  differed,
all  of  which  were  due  to  difficulty  interpreting  new  model  content.  These  and  other
studies  also  have  begun  to  analyze  the  (validity)  factors  that  appear  to  affect  differences  in
interrater  reliability.


‡  This  material  is  taken  from  a  presentation  by  Dennis  Goldenson,  titled  “But  Can  I  Trust  the  Appraisal
Results?  Existing  Evidence  and  What’s  Coming  Next,”  delivered  to  the  National  Research
Council,  Canada,  in  October  2002.


CMU/SEI-2002-TR-037 




Appendix  C  Explanation  of  a  Box  and  Whisker  Plot


The  box  and  whisker  plot  below  (Figure  15)  provides  a  graphical  presentation  of  data  for  displaying
various  features  such  as  dispersion,  location,  and  skewness.  The  bottom  of  the  box
corresponds  to  the  first  quartile  ($Q_1$)  and  indicates  the  value  of  the  variable  to  which  25%  of
the  observations  are  less  than  or  equal.  Similarly,  the  top  of  the  box  corresponds  to  the  third
quartile  ($Q_3$).  The  length  of  the  box,  called  the  IQR  (interquartile  range,  $Q_3 - Q_1$),  is  a  measure  of  the
dispersion  of  the  data.  A  line  within  the  box  indicates  the  median  (the  50th  percentile),  which
is  the  statistic  that  indicates  the  center  of  the  distribution.  The  median  in  this  study  is
drawn  with  a  bold  line  so  that  it  remains  distinguishable  when  it  coincides  with  either  end  of  the  box.  Two
whiskers  are  extended  from  the  box.  The  lower  whisker  starts  at  $\max\{X_{(1)},\ Q_1 - 1.5(Q_3 - Q_1)\}$
and  the  upper  whisker  ends  at  $\min\{X_{(n)},\ Q_3 + 1.5(Q_3 - Q_1)\}$,  where  $X_{(1)}$  and  $X_{(n)}$  are  the  smallest
and  largest  observed  values.  Outliers  are  data  points  beyond  the  lower  and  upper
whiskers,  and  they  are  plotted  with  o’s.  Extreme  values  are  data  points  beyond  the  outliers,
and  they  are  plotted  with  asterisks  (*).




Figure  15:  Explanation  of  a  Box  and  Whisker  Plot 
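
The quantities described in this appendix can be computed as in the following sketch. Two details are assumptions of the example rather than statements from this report: quartiles are taken with the linear-interpolation ("inclusive") method of Python's `statistics.quantiles`, and "extreme values" are taken to be points more than three IQRs beyond the box, a convention used by some statistical packages. The function name `box_plot_summary` is likewise hypothetical.

```python
from statistics import quantiles
from typing import Dict, Sequence


def box_plot_summary(values: Sequence[float]) -> Dict[str, object]:
    """Quartiles, whisker ends, outliers, and extreme values for one variable."""
    data = sorted(values)
    # Assumption: quartiles by linear interpolation ("inclusive" method).
    q1, med, q3 = quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1

    # Whisker ends as defined in the text.
    lower_whisker = max(data[0], q1 - 1.5 * iqr)
    upper_whisker = min(data[-1], q3 + 1.5 * iqr)

    # Assumption: "extreme" means more than 3 IQRs beyond the box.
    extreme_low, extreme_high = q1 - 3 * iqr, q3 + 3 * iqr
    beyond = [x for x in data if x < lower_whisker or x > upper_whisker]
    outliers = [x for x in beyond if extreme_low <= x <= extreme_high]
    extremes = [x for x in beyond if x < extreme_low or x > extreme_high]

    return {
        "q1": q1,
        "median": med,
        "q3": q3,
        "iqr": iqr,
        "lower_whisker": lower_whisker,
        "upper_whisker": upper_whisker,
        "outliers": outliers,  # plotted with o's
        "extremes": extremes,  # plotted with asterisks
    }
```

Other quartile conventions shift the box slightly for small samples, so results may differ a little from those of a particular statistics package.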